Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make alerting rules resilient to scrape failures #3018

Merged

Conversation

simonpasquier
Copy link
Contributor

Description of the change:

This change tightens the alerting rules to avoid resetting the alerts upon transient scrape failures.

It also removes the message annotation in favor of the description annotation which is more commonly used by the Prometheus community.

Motivation for the change:

More resilient alerting.

Architectural changes:

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

This change tightens the alerting rules to avoid resetting the alerts
upon transient scrape failures.

It also removes the `message` annotation in favor of the `description`
annotation which is more commonly used by the Prometheus community.

Signed-off-by: Simon Pasquier <[email protected]>
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 24, 2023
@openshift-ci
Copy link

openshift-ci bot commented Aug 24, 2023

Hi @simonpasquier. Thanks for your PR.

I'm waiting for a operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from anik120 and kevinrizza August 24, 2023 07:35
@perdasilva
Copy link
Collaborator

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 24, 2023
Copy link
Collaborator

@perdasilva perdasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@openshift-ci
Copy link

openshift-ci bot commented Aug 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 24, 2023
@simonpasquier
Copy link
Contributor Author

@perdasilva can you comment with /lgtm? Your previous comment missed the /.

@grokspawn
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 19, 2023
@grokspawn grokspawn added this pull request to the merge queue Oct 20, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 20, 2023
@simonpasquier
Copy link
Contributor Author

/retest

@tmshort tmshort added this pull request to the merge queue Oct 25, 2023
Merged via the queue into operator-framework:master with commit cda76ee Oct 25, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants